We show how to efficiently enumerate a class of finite-memory stochastic processes using the 
causal representation of e-machines. We characterize e-machines in the language of automata theory 
and adapt a recent algorithm for generating accessible deterministic finite automata, pruning this 
over-large class down to that of e-machines. As an application, we exactly enumerate topological 
e-machines up to eight states and six-letter alphabets. 
I. INTRODUCTION 

What does the landscape of stochastic processes look like? Some classes of process, e.g., those modeled by Markov chains and hidden Markov models, finite or denumerable, are familiar to us since they have proven so useful as models of randomness in real-world systems. Even if this familiarity reflects a now-extensive understanding of particular classes, it raises the question of the intrinsic organization and diversity found in the space of all stochastic processes. Randomly selecting a stochastic process, how often does one find that it saturates the entropy rate? How many distinct processes are there at a given entropy rate or with a given number of states? Answers to these and related questions will go some distance toward understanding the richness of stochastic processes and these, in turn, will provide hints as to what is possible in nature.

Stochastic processes show up in an exceedingly wide 
range of fields, but they are not generally analyzed or 
classified in broad swaths. In an attempt to address 
such concerns, we show how to enumerate the class of 
stochastic processes that admit the causal representation 
of finite-state e-machines. 

An e-machine is the minimally complex, maximally 
predictive representation that completely captures all of 
a stochastic process's information storage and process- 
ing properties. The e-machine representation allows for 
direct analysis of the underlying process using only relevant information, and it provides a framework for comparing different processes through common, measurable quantities. The literature on computational mechanics [S], the area responsible for the theory of e-machines,
provides details about the construction of e-machines 
from process output, proof of their optimality, various 
information-theoretic quantities that can be calculated 
from the e-machine, and more. 

Here, we consider stationary stochastic processes over discrete states and discrete alphabets. Given that each such process can be completely represented by its e-machine, to enumerate all stochastic processes it suffices to enumerate all e-machines. Even if one restricts to the case of e-machines with finitely many states, this task appears to be extraordinarily difficult. So, as a first step, we enumerate a subclass of e-machines called topological e-machines, which represent a subclass of all finite-memory processes. In a sequel, we extend the ideas presented here to more general stochastic processes and their e-machines.

Although we are a long way from mapping the land- 
scape of all stochastic processes, enumerating a subclass 
of finite-memory stochastic processes is useful for a num- 
ber of reasons. First is basic understanding. One would 
simply like to know how many processes there are for a 
given number of states and alphabet size. Moreover, if 
we fix one of these parameters and increase the other, it 
is informative to see how the number of distinct processes 
scales as well. Second, it allows for a thorough survey of process characteristics. An example of such a survey is found in Ref. [6]. Third, an enumerated list of processes can be used to rigorously establish properties for various kinds of complex systems. A library like this was used in Refs. [7] and [5] to prove theorems about pattern formation in cellular automata. Finally, and rather more generally, one needs to be able to sample and explore the space of processes in a random or a systematic way, as required, for example, in Bayesian inference [9].

Starting from an algorithm initially designed to enumerate deterministic finite automata, we use e-machine properties as selection criteria for these automata, which yields the set of topological e-machines (and the processes they describe). Our development is organized as follows. First, we briefly discuss our previous approach to this problem using a different orderly enumeration algorithm due to Read [10], followed by an overview of the algorithm on which our enumeration scheme is based [11]. Second, we lay out the machinery of this algorithm, reviewing automata theory and computational mechanics. We define the necessary concepts as they apply to topological e-machine generation and enumeration. Third, we describe our algorithm, give pseudocode for its implementation, and prove that it successfully enumerates all topological e-machines. Fourth, we present enumeration results as a function of the number of states and symbols. We also discuss the performance of the new algorithm, comparing it to our previous algorithm, and explain the improvements.



II. RELATED WORK 

The enumeration of e-machines has not, to our knowledge, been previously explored outside of the above-cited works. The enumeration of certain classes of DFAs, in contrast, has been pursued with varying degrees of success. Of particular interest, strongly connected and minimal complete finite automata were separately enumerated in Refs. [12] and [13], respectively. See Ref. [14] and references therein for more details on other recent efforts.

Much of the literature on computational mechanics 
focuses on e-machines from the standpoint of Markov 
chains and stochastic processes and, therefore, typically 
uses the transition matrices as an e-machine's represen- 
tation. Our first approach for enumerating finitary pro- 
cesses focused on generating all possible transition ma- 
trices and, hence, all e-machines, interpreted as labeled 
directed graphs. Read [10] presented an orderly genera- 
tion algorithm that could be used to efficiently generate 
certain classes of combinatorial objects. Among the ob- 
jects that can be generated are directed and undirected 
graphs, rooted trees, and tournaments (interpreted as a 
special class of directed complete graphs). The essence 
of Read's algorithm is that, given the complete list C_m of graphs with n nodes and m edges, we can construct the complete list C_{m+1} of graphs with n nodes and m + 1 edges without having to run an isomorphism check against each of the already constructed graphs. This offers a significant speed improvement over the classical method.

We initially adapted Read's algorithm to generate all edge-labeled multi-digraphs (with loops). From this extensive list, we then eliminated graphs that were not strongly connected and minimal in the sense of finite automata theory. While this algorithm was successful, it had three main performance drawbacks: 1) a large memory footprint, as C_m must be stored to generate C_{m+1}; 2) an improved, but still extensive, isomorphism check for each generated graph, whose worst case requires n! comparisons per generated graph; and 3) generation of a substantially larger class than needed and, as a consequence, many graphs to eliminate.

Our second approach, and the one presented in detail 
here, uses a different representation of e-machines, look- 
ing at them as a type of deterministic finite automata 
(DFA). The new algorithm suffers from none of the previous method's problems. It should be noted, however, that this method cannot be used to enumerate the more general structures available via Read's algorithm.

In his thesis, Nicaud [15] discussed the enumeration of "accessible" DFAs restricted to binary alphabets. These results were then independently extended to k-ary alphabets in Refs. [16] and [17]. Recently, Almeida et al. [11] developed an algorithm that generates all possible accessible DFAs with n states and k symbols using a compact string representation initially discussed in Ref. [15]. They showed that considering the "skeleton" of these DFAs as k-ary trees with n internal nodes guarantees that a DFA's states are all accessible from a start state. From there, they procedurally add edges to the tree in all possible ways to generate all DFAs. As it is possible to generate all such trees, they show that
it is possible to generate all accessible DFAs. They con- 
tinue on to discuss their enumeration in comparison to 
the methods of Refs. [16] and [TT], as well as giving a 
brief commentary on the percentage of DFAs that are 
minimal for a given number of states and symbols. 

III. AUTOMATA REPRESENTATIONS 

We start with notation and several definitions from 
automata theory [19] that serve as the basis for the algo- 
rithm. 

Definition. A deterministic finite automaton is a tuple (Q, Σ, δ, q_0, F), where Q is a finite set of states, Σ is a discrete alphabet, δ : Q × Σ → Q is the transition function, q_0 is the start state, and F ⊆ Q is the set of final (or accepting) states.

We extend the transition function in the natural way, with δ(q, λ) = q for all q ∈ Q and, for v, v' ∈ Σ*, δ(q, vv') = δ(δ(q, v), v'). Here, λ denotes the empty word.
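As a minimal sketch (we use Python for all illustrations; the dictionary encoding of δ and the name `delta_star` are our own, not from the source), the extended transition function of a possibly incomplete DFA can be evaluated one symbol at a time, returning nothing as soon as an edge is missing. The data is the topological Even Process of Fig. 2 with A = 0 and B = 1.

```python
from typing import Dict, Iterable, Optional, Tuple

# Partial transition function: (state, symbol) -> next state.
# A missing key means the edge does not exist (incomplete DFA).
Delta = Dict[Tuple[int, int], int]


def delta_star(delta: Delta, q: int, word: Iterable[int]) -> Optional[int]:
    """Extended transition function: delta*(q, empty word) = q, and each
    further symbol is applied to the state reached so far.
    Returns None as soon as an edge is undefined."""
    for v in word:
        nxt = delta.get((q, v))
        if nxt is None:
            return None
        q = nxt
    return q


# Topological Even Process (Fig. 2) with A = 0, B = 1.
even: Delta = {(0, 0): 0, (0, 1): 1, (1, 1): 0}
print(delta_star(even, 0, [0, 1, 1]))  # 0  (A -0-> A -1-> B -1-> A)
print(delta_star(even, 1, [0]))        # None (B has no 0-edge)
print(delta_star(even, 1, []))         # 1  (empty word)
```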

With |Q| = n and |Σ| = k, we take our set of states to be Q = {0, ..., n − 1} and our alphabet to be Σ = {0, ..., k − 1}. When context alone does not make it clear, states and symbols will be denoted by q_i and v_j, respectively. We will use F = Q (all states are accepting) for our algorithm; this is not a general characteristic of DFAs, but it is a property of e-machines.

Definition. A DFA is complete if the transition function δ is total. That is, for any state q ∈ Q and symbol v ∈ Σ, δ(q, v) = q' for some q' ∈ Q.

The DFAs generated by the Almeida et al. algorithm may be incomplete [11]. Shortly, we will see that incompleteness is a necessary condition for a DFA with more than one state to be a topological e-machine.

Definition. Two states, q and q', of a DFA are said to be equivalent if for all words w ∈ Σ*, δ(q, w) ∈ F if and only if δ(q', w) ∈ F. That is, for every word w, following the transitions from q and from q' leads either to accepting states in both cases or to nonaccepting states in both cases (an undefined transition counts as nonaccepting). A DFA is minimal if no two distinct states are equivalent.

As we take F = Q for e-machines, we can simplify the idea of equivalence somewhat. Two states of a topological e-machine are equivalent if exactly the same set of words can be read starting from each of them.

Definition. A DFA is accessible or initially connected if for any state q ∈ Q, there exists a word w ∈ Σ* such that δ(q_0, w) = q.

Simply put, there is a directed path from the initial state to every other state. The reverse, a path from every state back to the initial state, need not exist.

Definition. A DFA is strongly connected if for any two states q, q' ∈ Q, there is a word w ∈ Σ* such that δ(q, w) = q'. Equivalently, for any state q ∈ Q, setting q_0 = q results in the DFA still being accessible.
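Both properties can be checked with a breadth-first search, and the second characterization of strong connectivity translates directly into code. The sketch below is an illustration under our own assumptions (dictionary encoding of δ, hypothetical function names); it is not taken from the source.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Partial transition function: (state, symbol) -> next state.
Delta = Dict[Tuple[int, int], int]


def reachable(delta: Delta, k: int, start: int) -> Set[int]:
    """All states reachable from `start`, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for v in range(k):
            t = delta.get((q, v))
            if t is not None and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def is_accessible(delta: Delta, n: int, k: int, q0: int = 0) -> bool:
    return len(reachable(delta, k, q0)) == n


def is_strongly_connected(delta: Delta, n: int, k: int) -> bool:
    # Second characterization: accessible for every choice of q0.
    return all(is_accessible(delta, n, k, q) for q in range(n))


even = {(0, 0): 0, (0, 1): 1, (1, 1): 0}   # topological Even Process
dead_end = {(0, 0): 0, (0, 1): 1}           # state 1 has no outgoing edge
print(is_strongly_connected(even, 2, 2))    # True
print(is_accessible(dead_end, 2, 2), is_strongly_connected(dead_end, 2, 2))  # True False
```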

Definition. Two DFAs are isomorphic if there is a one-to-one map between the states that 1) maps accepting and nonaccepting states of one DFA to the corresponding states of the other, 2) preserves adjacency, and 3) preserves edge labeling when applied to δ.

Definition. A finite e-machine is a probabilistic finite-state machine with a set of causal states {σ_0, ..., σ_{n−1}}, a finite alphabet {v_0, ..., v_{k−1}}, and transition probabilities on the edges between states, given by a set of transition matrices T^(v_i), i ∈ {0, ..., k − 1}. Given the current state, the next state is determined by the output symbol. States are probabilistically distinguishable, so the e-machine is minimal.

An e-machine has transient and recurrent components, 
but we only focus on the recurrent portion, as the tran- 
sient component can be calculated from the recurrent. In 
the following, when we talk about e-machines, we implic- 
itly refer to the recurrent states. With this restriction, 
e-machines are also strongly connected. 

Figure 1 gives the e-machine for the Even Process [20]. The Even Process produces binary sequences in which all blocks of uninterrupted 1s are even in length, bounded by 0s. Furthermore, after each even length is reached, there is a probability p of breaking the block of 1s by inserting a 0. If a 0 is inserted, then the same rule applies again.
FIG. 1: Even Process. The transition labels denote the probability p ∈ (0, 1) of generating symbol x as p|x.

Definition. A topological e-machine is an e-machine 
where the transition probabilities from a single state are 
uniform across all outgoing edges. 

The topological e-machine for the Even Process is given in Fig. 2. We see that the transitions on both edges leaving state A have probability 1/2, instead of p and 1 − p as they were in the original Even Process e-machine.




FIG. 2: Topological e-machine for the Even Process. Transi- 
tion probabilities are uniform across edges leaving state A. 

Since the transition probabilities are uniform across 
all edges leaving each single state, we only need to know 
their number. As far as the enumeration algorithm is 
concerned, we may effectively ignore the probabilities and 
focus instead on where the edges go. 

This makes clear the name topological e-machine: We 
are only interested in the topological structure (connec- 
tivity or adjacency) as this determines all its other prop- 
erties. 

One of the key reasons for the success of the algorithm is its compact representation of DFAs, which allows for direct enumeration. Recall that |Σ| = k and suppose that there is a fixed ordering 0, ..., n − 1 on the states Q.

Definition. A DFA's string S = [t_0, t_1, ..., t_{nk−1}] is an nk-tuple that specifies the terminal state t_i ∈ Q on each outgoing edge. The first k entries in the string correspond to the states reached by following the edges labeled 0, ..., k − 1 that start in state 0. The next k entries correspond to the edges that start in state 1, and so on. Thus, for each of the n states, there are k specified transitions, and t_{qk+v} = δ(q, v). If an outgoing edge does not exist, the corresponding entry is marked with t_i = -1.

For clarity, let's consider the topological e-machine for the Even Process. Let states A and B be denoted by 0 and 1, respectively. The transition symbols will also be 0 and 1, though there is no connection between the two labelings. As A transitions to A on a 0 and to B on a 1, the terminal states for these two transitions are 0 and 1, respectively. B has no outgoing transition on symbol 0, so that will be denoted -1 in the string, while the transition from B to A on a 1 will be given by 0. Thus, the string representation for the Even Process is S = [0, 1, -1, 0].
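A minimal sketch of this encoding, assuming the same hypothetical dictionary form of δ as before: entry t_{qk+v} holds δ(q, v), or -1 when the edge is absent.

```python
from typing import Dict, List, Tuple


def to_string(delta: Dict[Tuple[int, int], int], n: int, k: int) -> List[int]:
    """DFA string S = [t_0, ..., t_{nk-1}] with t_{q*k + v} = delta(q, v),
    or -1 where the edge (q, v) does not exist."""
    return [delta.get((q, v), -1) for q in range(n) for v in range(k)]


# Topological Even Process with A = 0, B = 1 and symbols 0, 1.
even = {(0, 0): 0, (0, 1): 1, (1, 1): 0}
print(to_string(even, 2, 2))  # [0, 1, -1, 0]
```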
In the definition of a DFA's string, we assumed a fixed ordering on the states. In general, there are n! ways to label the states and as many strings, so we need a way to fix a labeling unambiguously. To do this, we label the states in the order in which they are reached by following edges lexicographically from state q_0. Start with q_0 = 0, then follow the edges coming out of q_0 in order: 0, 1, ..., k − 1. The first state reached that is not state 0 is labeled as 1. The next state that is not 0 or 1 becomes state 2, and so on. Once the edges 0, ..., k − 1 have been explored, the procedure is repeated, starting from state 1, then state 2, and so on, until all the states have been labeled. Given the initial state q_0 of an accessible DFA, the edges uniquely determine the labeling of all the other states in the DFA. A proof can be found in Ref. [11]. Note that the DFA must be accessible for this to work, else states will be missed in the labeling process.
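The labeling procedure can be sketched as a breadth-first scan that hands out new labels in order of first discovery. The function name `relabel` and its return convention (None when some state is never reached) are our own choices for illustration. Applied to the string S_1 of the Fig. 3 example discussed below, it reproduces the representations obtained with q_0 = B and q_0 = C.

```python
from typing import List, Optional


def relabel(S: List[int], n: int, k: int, start: int) -> Optional[List[int]]:
    """Re-express DFA string S with `start` as q0, labeling states in the
    order they are first reached when the edges 0, ..., k-1 of states
    0, 1, 2, ... are followed in turn. Returns None if some state is
    never reached (the DFA is not accessible from `start`)."""
    order = [start]            # order[i] = old name of the state labeled i
    new = {start: 0}           # old name -> new label
    i = 0
    while i < len(order):
        q = order[i]
        for v in range(k):
            t = S[q * k + v]
            if t != -1 and t not in new:
                new[t] = len(order)
                order.append(t)
        i += 1
    if len(order) < n:
        return None
    return [-1 if S[q * k + v] == -1 else new[S[q * k + v]]
            for q in order for v in range(k)]


# Fig. 3 machine written with q0 = A, i.e. (A, B, C) = (0, 1, 2).
S1 = [1, 2, 0, 0, -1, 2, -1, 0, 2]
print(relabel(S1, 3, 3, 1))  # [1, -1, 2, 0, 2, 1, -1, 1, 2]  (q0 = B)
print(relabel(S1, 3, 3, 2))  # [-1, 1, 0, 2, 0, 1, 1, -1, 0]  (q0 = C)
```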

Definition. Given a DFA string S, the corresponding flag f = [f_0, f_1, ..., f_n] is an (n + 1)-tuple, with f_0 = -1, f_n = nk, and f_i = min{j : t_j = i}. That is, f_i is the index of the first occurrence of i in the DFA string S. Note that as the DFA is accessible, f_i ≤ ik − 1.

The flag for the Even Process shown above is [-1, 1, 4].
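Computing a flag is a one-liner. The consistency check below encodes the accessibility bound f_i ≤ ik − 1 from the definition, together with the fact that first occurrences increase because states are labeled in order of discovery; both helper names are hypothetical.

```python
from typing import List


def flag(S: List[int], n: int, k: int) -> List[int]:
    """f_0 = -1, f_n = nk, and f_i = index of the first occurrence of state i."""
    return [-1] + [S.index(i) for i in range(1, n)] + [n * k]


def flag_is_consistent(f: List[int], k: int) -> bool:
    """Accessibility bound f_i <= ik - 1 for 0 < i < n, plus strictly
    increasing first occurrences (states are labeled in discovery order)."""
    n = len(f) - 1
    return (all(f[i] <= i * k - 1 for i in range(1, n))
            and all(f[i] < f[i + 1] for i in range(n)))


print(flag([0, 1, -1, 0], 2, 2))                   # [-1, 1, 4]  (Even Process)
print(flag([-1, 1, 0, 2, 0, 1, 1, -1, 0], 3, 3))   # [-1, 1, 3, 9]
```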

IV. ENUMERATION ALGORITHM 

To enumerate and generate all topological e-machines, we begin with the Almeida et al. algorithm [11] that generates all accessible DFAs, of which topological e-machines are a subclass. We then eliminate those DFAs that are not e-machines. The following lemmas help with this process.

Lemma 1. A topological e-machine with n states has at 
least n transitions. 

Proof. Assume there are at most n − 1 transitions. Then there is at least one state with no outgoing transition. There is no path from this state to any other state, so this cannot be an e-machine, as it is not strongly connected.

□

Lemma 2. A topological e-machine with n > 1 states 
and alphabet size k can have at most nk — 1 transitions. 

Proof. The number of transitions is at most nk, as each state can have at most k transitions. Suppose that an e-machine has nk transitions. Then every word w ∈ Σ* can be read from every state, so all states are pairwise equivalent. With n > 1 states, this cannot be an e-machine, since it is not minimal. Thus, there are at most nk − 1 transitions. □

This establishes our earlier claim that topological e-machines with more than one state are incomplete.

Lemma 3. A topological e-machine with n states has n 
isomorphic string automata representations. 



Proof. An e-machine is strongly connected. In the above definition of a strongly connected DFA, we gave an equivalent characterization where any state may serve as q_0 and result in an accessible DFA. As state q_0 determines the labeling of the states, and so the string representations, there are exactly n such representations. □

We now need to determine the canonical representa- 
tion for a given topological e-machine. Given the n dif- 
ferent strings that all represent the e-machine equally 
well, which do we add to our enumerated list, and how 
do we know if we already have some isomorphism of an 
e-machine on our list? 

A closed-form expression to exactly count the number B_{n,k} of possibly incomplete, accessible DFAs with n states and alphabet size k was developed in Ref. [11]. A bijection between the integers 0, ..., B_{n,k} − 1 and the DFAs generated by the algorithm was also given. In this way, we can determine the i-th DFA generated by the algorithm and, likewise, given an arbitrary accessible DFA, we can determine exactly where in the generation sequence it occurs. This bijection allows us to easily determine whether an e-machine is the canonical representation for its isomorphism class. We denote by B_{n,k}(S) the index of the string representation S in the enumeration process. Appendix A gives the details.

Definition. Given the n different string representations of a topological e-machine, S_1, S_2, ..., S_n, the canonical representation Ŝ is the string with the smallest B_{n,k} value. It is the first of these isomorphic representations to be generated by the enumeration process:

$$\hat{S} = \arg\min_{1 \le i \le n} B_{n,k}(S_i).$$

With this definition of a canonical representation, it is simple to determine whether a given e-machine has already been generated: Compute the index B_{n,k}(S) of its representation S. Take each state in turn as q_0 and compute the new string representation. If any of the resulting representations has a lower index than the original, then the given e-machine is not canonical. So, we ignore it and generate the next DFA in the enumeration sequence.

To solidify the above ideas, consider the topological e-machine in Fig. 3. Note that since transition probabilities are not relevant to the enumeration process, we omit them entirely and only show the output symbol. Also, note that we label our states with letters, not numbers, for clarity.

Depending on the choice of q_0, there are 3 different representations of this e-machine:

1. q_0 = A:

To determine the state ordering, we follow the edges leaving A in order: the edge labeled 0 leads to B, so q_1 = B, and the edge labeled 1 leads to C, so q_2 = C. In this way we identify (A, B, C) as (0, 1, 2) and obtain the string representation S_1 = [1, 2, 0, 0, -1, 2, -1, 0, 2]. From this, we compute that B_{n,k}(S_1) = 70791.
FIG. 3: Arbitrary topological e-machine with 3 states over an alphabet of size 3.

2. q_0 = B:

We find that q_1 = A and q_2 = C. So, we identify (A, B, C) as (1, 0, 2) and determine that S_2 = [1, -1, 2, 0, 2, 1, -1, 1, 2]. This yields B_{n,k}(S_2) = 55115.

3. q_0 = C:

We identify (A, B, C) = (1, 2, 0), finding that S_3 = [-1, 1, 0, 2, 0, 1, 1, -1, 0] and B_{n,k}(S_3) = 18977.

All three strings are valid representations of the e-machine, but the third, S_3, has the lowest index (18977) in the enumeration sequence, so it is the canonical representation of the e-machine. During the enumeration process, the other two representations would be ignored after it was determined that they were noncanonical.
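We do not reproduce the index B_{n,k}(S) here (see Appendix A). As a sketch, we instead assume that comparing indices is equivalent to comparing generation order as described in Algorithm 1 below: flags in reverse lexicographic order, then strings in lexicographic order within a flag. The hypothetical `order_key` encodes that assumption; it agrees with the three indices quoted above (S_3 before S_2 before S_1).

```python
from typing import List, Optional, Tuple


def relabel(S: List[int], n: int, k: int, start: int) -> Optional[List[int]]:
    """String of the same DFA with `start` as q0 (lexicographic labeling);
    None if some state is unreachable from `start`."""
    order, new = [start], {start: 0}
    i = 0
    while i < len(order):
        for v in range(k):
            t = S[order[i] * k + v]
            if t != -1 and t not in new:
                new[t] = len(order)
                order.append(t)
        i += 1
    if len(order) < n:
        return None
    return [-1 if S[q * k + v] == -1 else new[S[q * k + v]]
            for q in order for v in range(k)]


def order_key(S: List[int], n: int, k: int) -> Tuple[tuple, tuple]:
    """ASSUMED stand-in for B_{n,k}(S): flags in reverse lexicographic order,
    then strings in lexicographic order. Negating each flag entry turns
    reverse-lex order into ordinary lex order."""
    f = [-1] + [S.index(i) for i in range(1, n)] + [n * k]
    return tuple(-x for x in f), tuple(S)


def canonical(S: List[int], n: int, k: int) -> List[int]:
    """Among the n representations (one per choice of q0), return the one
    that comes first in the assumed generation order."""
    reps = [relabel(S, n, k, q) for q in range(n)]
    if any(r is None for r in reps):
        raise ValueError("DFA is not strongly connected")
    return min(reps, key=lambda r: order_key(r, n, k))


S1 = [1, 2, 0, 0, -1, 2, -1, 0, 2]   # Fig. 3 with q0 = A
print(canonical(S1, 3, 3))            # [-1, 1, 0, 2, 0, 1, 1, -1, 0], i.e. S_3
```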

With this information in-hand, we can now provide the 
pseudocode for our algorithm. For clarity of discussion, 
we break the algorithm into two pieces. The first gener- 
ates accessible DFAs, while the second tests to see if they 
are topological e-machines. 

We only highlight the important aspects of the DFA generation algorithm here. For a more complete discussion, as well as code for implementation, see Ref. [11].
Algorithm 1. DFA Generation 

Input: Number of states n, alphabet size k. 

1. Generate the flags in reverse lexicographic order. 

2. For each flag: 

(a) Generate strings with this flag one at a time, 
in lexicographic order. Each is generated from 
the previous. 

(b) Test the DFA string S to see if it is a canonical topological e-machine. (See Algorithm 2.)

(c) If the DFA is canonical, output B_{n,k}(S) to the list of topological e-machines.

(d) Move to next flag when all strings have been 
generated. 

3. Terminate after last string for last flag has been 
generated. 



Output: The list of indices {B_{n,k}(S)} of all topological e-machines for the given n and k.
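The flag-driven generator of Almeida et al. is not reproduced here. For experimenting with small n and k, a hypothetical brute-force stand-in can scan all (n + 1)^{nk} strings over {-1, 0, ..., n − 1} and keep those that are accessible from state 0 and already written in the lexicographic labeling. This should yield the same set of accessible DFAs, but in a different order and at exponential cost.

```python
from itertools import product
from typing import Iterator, List, Optional


def relabel(S: List[int], n: int, k: int, start: int) -> Optional[List[int]]:
    """Lexicographic relabeling from `start`; None if not accessible."""
    order, new, i = [start], {start: 0}, 0
    while i < len(order):
        for v in range(k):
            t = S[order[i] * k + v]
            if t != -1 and t not in new:
                new[t] = len(order)
                order.append(t)
        i += 1
    if len(order) < n:
        return None
    return [-1 if S[q * k + v] == -1 else new[S[q * k + v]]
            for q in order for v in range(k)]


def accessible_strings(n: int, k: int) -> Iterator[List[int]]:
    """HYPOTHETICAL brute-force stand-in for the DFA generator: keep every
    string that is accessible from state 0 and already in canonical labeling."""
    for t in product(range(-1, n), repeat=n * k):
        S = list(t)
        if relabel(S, n, k, 0) == S:
            yield S


print(sum(1 for _ in accessible_strings(1, 2)))  # 4
print(sum(1 for _ in accessible_strings(2, 2)))  # 45
```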

Algorithm 2. Test for topological e-machine 

Input: DFA X in string representation S and its index B_{n,k}(S).

1. Reject X unless it has at least n transitions.

2. Reject X if n > 1 and it has nk transitions.

3. For i = 1, ..., n − 1:

(a) Create a new DFA Y_i from DFA X with q_0 = i.

(b) Reject X if the states of Y_i cannot be labeled by following edges lexicographically from q_0.

(X is not strongly connected.)

(c) Build string S_i for Y_i.

(d) Compute index B_{n,k}(S_i).

(e) Reject X if B_{n,k}(S_i) < B_{n,k}(S).
(X is not canonical.)

4. Reject X if it is not a minimal DFA.

Output: True or False, whether the input DFA is a 
canonical representation of a topological e-machine. 
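Algorithm 2 can be sketched end to end as follows. Three assumptions are ours: the index comparison of step 3(e) is replaced by the generation-order key assumed earlier, minimality is tested with a Moore-style partition refinement (possibly different from the method of Ref. [19]), and candidates come from a brute-force scan rather than Algorithm 1. Step 2 is guarded by n > 1, as Lemma 2 requires, so that the single-state machine with all k self-loops is kept. All function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def relabel(S: List[int], n: int, k: int, start: int) -> Optional[List[int]]:
    """Same DFA with q0 = start, states labeled lexicographically; None if
    some state cannot be reached from `start`."""
    order, new, i = [start], {start: 0}, 0
    while i < len(order):
        for v in range(k):
            t = S[order[i] * k + v]
            if t != -1 and t not in new:
                new[t] = len(order)
                order.append(t)
        i += 1
    if len(order) < n:
        return None
    return [-1 if S[q * k + v] == -1 else new[S[q * k + v]]
            for q in order for v in range(k)]


def order_key(S: List[int], n: int, k: int) -> Tuple[tuple, tuple]:
    # ASSUMED stand-in for comparing B_{n,k} indices: flags in reverse
    # lexicographic order, then strings in lexicographic order.
    f = [-1] + [S.index(i) for i in range(1, n)] + [n * k]
    return tuple(-x for x in f), tuple(S)


def is_minimal(S: List[int], n: int, k: int) -> bool:
    # Moore-style partition refinement with F = Q (missing edge = dead sink).
    block = [0] * n
    while True:
        ids: dict = {}
        new_block = []
        for q in range(n):
            sig = (block[q],) + tuple(
                -1 if S[q * k + v] == -1 else block[S[q * k + v]]
                for v in range(k))
            new_block.append(ids.setdefault(sig, len(ids)))
        if len(ids) == len(set(block)):    # no block was split: stable
            return len(ids) == n
        block = new_block


def is_topological_emachine(S: List[int], n: int, k: int) -> bool:
    """Algorithm 2 for a string S already labeled from q0 = 0."""
    edges = sum(t != -1 for t in S)
    if edges < n:                          # step 1 (Lemma 1)
        return False
    if n > 1 and edges == n * k:           # step 2 (Lemma 2 needs n > 1)
        return False
    key = order_key(S, n, k)
    for i in range(1, n):                  # step 3
        Si = relabel(S, n, k, i)           # 3(a) and 3(c)
        if Si is None:                     # 3(b): not strongly connected
            return False
        if order_key(Si, n, k) < key:      # 3(e): not canonical
            return False
    return is_minimal(S, n, k)             # step 4


def count_topological_emachines(n: int, k: int) -> int:
    # Brute-force stand-in for Algorithm 1's generator (small n, k only).
    return sum(
        1 for t in product(range(-1, n), repeat=n * k)
        if relabel(list(t), n, k, 0) == list(t)
        and is_topological_emachine(list(t), n, k))


print(count_topological_emachines(1, 2), count_topological_emachines(2, 2))  # 3 7
print(is_topological_emachine([-1, 1, 0, 2, 0, 1, 1, -1, 0], 3, 3))  # True  (S_3)
print(is_topological_emachine([1, 2, 0, 0, -1, 2, -1, 0, 2], 3, 3))  # False (S_1)
```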

Note that steps 1 and 2 are not formally necessary for the algorithm to work, as any DFA that fails them is, respectively, not strongly connected or nonminimal. However, these tests are quicker than the checks for connectedness or minimality, which is why Lemmas 1 and 2 were mentioned.

Proposition 1. The above algorithm generates all topo- 
logical e-machines with n states and k symbols. 

Proof. It was already shown in Ref. [11] that the original 
algorithm generates all accessible DFAs with n states and 
k symbols. We need only show that our additions result 
in only topological e-machines being generated. 

As stated previously, topological e-machines are minimal and strongly connected. We also require a single representative of each isomorphism class. We check that we only get strongly connected DFAs in step 3(b), and we get minimality from step 4. Finally, we prune isomorphisms with the test in step 3(e). □

See Ref. [19] for details on the minimization algorithm 
used here. Also, note that we are not interested in the 
minimal DFA itself, only whether the given DFA is min- 
imal. We minimize the automaton and accept it if it has 
the same number of states as the original. 

Note that the order of the above checks for connect- 
edness, minimality, and isomorphic redundancy can be 
changed, but the performance of the algorithm suffers. 
The minimization algorithm is the slowest step, so it should be performed as few times as possible, which is why it appears last.
TABLE I: The number E_{n,2} of binary-alphabet topological e-machines as a function of the number n of states and the number of edges. The number B_{n,2} of accessible binary DFAs is listed for comparison.



V. RESULTS 

We ran the algorithm on a range of n and k values. To date, the majority of work in computational mechanics has focused on binary alphabets, so in Table I we provide not only the number E_{n,2} of e-machines with a binary alphabet, but also a breakdown by the number of edges (transitions) for a given number of states.

Looking at the numbers in the table, we see that the number of e-machines increases quite rapidly, but when compared to the total number B_{n,2} of accessible binary DFAs, the ratios decrease. At n = 3, 9.6% of all accessible DFAs were topological e-machines, while at n = 8 that ratio was already down to 3.8%. We also see that, for any given number of states, the majority of e-machines have the maximum number of possible edges. This is not surprising, as a DFA is more likely to be strongly connected with more edges present.

We note that E_{n,2} is now listed in the On-Line Encyclopedia of Integer Sequences as sequence A181554 [21].

We can certainly consider larger alphabets, and Table II provides the number E_{n,k} of e-machines for a given number of states n and alphabet size k.

Using the data in Table II, we again consider the ratios E_{n,k}/B_{n,k}. Looking at 2-state machines with an increasing alphabet, the ratio quickly approaches 1/2, indicating that almost every accessible DFA with 2 states represents a topological e-machine. (Recall that each 2-state e-machine has two string representations, only one of which is canonical.)

Although data is lacking to make a definitive conclu- 
sion, there is also a trend that the number of e-machines 
increases more rapidly with increasing states (at large al- 
phabet) than with increasing alphabet size. This agrees 
with how the number of accessible DFAs grows given 
these two conditions, but we need more data to be sure. 

At this point, we need to address two types of overcounting that appear in Table II. The first occurs due to multiple representations of a process using a larger alphabet. For example, all machines over l ≥ 2 letters are also machines over k letters for k > l. In fact, there are $\binom{k}{l}$ representations for each l-ary machine in the k-ary library. One may be more interested, however, in the new structural features and process characteristics that appear with a larger alphabet than in the number of ways we can re-represent machines with smaller alphabets. As such, Table III provides the number F_{n,k} of topological e-machines that employ all k letters. These machines cannot be found for smaller k and are, thus, "new" due to the larger alphabet.

The second type of overcounting is due to symbol isomorphism. Certain processes listed in both Tables II and III have multiple representations that are different as e-machines but have the same characteristics, for example, when quantified using information-theoretic measures of complexity. The Even Process, to take one example, can be considered as having even-length blocks of 1s, as depicted in Fig. 2, or even-length blocks of 0s. The measurable process characteristics are the same for these two processes. We include both in our list, as the numbers are of interest to those studying finite-state transducers, as one example.

We also note that Tables II and III are incomplete. This is not a shortcoming of the algorithm, but rather a consequence of the exploding number of e-machines. Looking only at the binary-alphabet e-machines, we see that their numbers increase very rapidly.

Looking at the generation times for binary-alphabet machines in Table IV, we see that the run times also increase very rapidly. Our estimate for 9-state binary machines is approximately 35 CPU days. Naturally, since they depend on current technology, the absolute times are less important than the increasing ratios of run times.

TABLE II: The number E_{n,k} of topological e-machines as a function of number of states n and alphabet size k.

TABLE III: The number F_{n,k} of full-alphabet topological e-machines as a function of number of states n and alphabet size k.

VI. APPLICATIONS 

Computational mechanics considers a number of different properties, including the entropy rate, statistical complexity, and excess entropy, to quantify a process's ability to store and transform information [5]. Additionally, there are known bounds on a number of these quantities as well as generalizations of e-machines that achieve these bounds; e.g., see the binary e-machine survey in Ref. [5]. However, little is known about the nonbinary-alphabet case and about other more recently introduced quantities, such as causal irreversibility and crypticity [22]. A survey of the intrinsic Markov order and the cryptic order for 6-state e-machines recently appeared in Ref. [23]. A series of sequels will provide additional surveys, all of which depend on the e-machine libraries we have shown how to construct.

Beyond this kind of fundamental understanding of the space of stochastic processes and the genericity of properties, e-machine enumeration has a range of practical applications. One often needs to statistically sample representations of finite-memory stochastic processes, and a library of e-machines forms the basis of such sampling schemes. In the computational mechanics analysis of spatiotemporal patterns in spatial dynamical systems, e-machines play the role of representing spacetime shift-invariant sets of configurations. The library can then be used in computer-aided proofs of the domains, particles, and particle interactions that are often emergent in such systems, as done in Ref. [5]. Finally, in Bayesian statistical inference from finite data, priors over the space of e-machines are updated based on the evidence the data provide. Applications along these lines will appear elsewhere.

time (seconds)
1.00 × 10^-2
1.30 × 10^-2
2.75 × 10^-1
1.39 × 10^1
7.80 × 10^2
4.94 × 10^4

TABLE IV: Average run times (2.4 GHz Intel Core 2 Duo CPU) to generate all binary-alphabet topological e-machines as a function of the number n of states.



VII. CONCLUSION 

Beginning with an algorithm for enumerating and gen- 
erating accessible DFAs, we showed how to enumerate all 
topological e-machines based on the fact that they are 
strongly connected and minimal DFAs, discounting for 
isomorphic redundancies along the way. 

There are a number of open problems and extensions 
to the algorithm and enumeration procedure to consider. 
Ideally, we would like to modify this algorithm, or create an altogether new one, that directly generates topological e-machines without having to generate a larger class of objects (counted by B_{n,k}) that we then prune. Failing this, we would at least like to generate a smaller class of DFAs, perhaps only those that are strongly connected, so that fewer candidate DFAs need be eliminated.

We would also like to find a closed-form expression for the number of topological e-machines for a given n and k. If this is not possible, we would like reasonable upper bounds on this quantity (better than B_{n,k}) and, perhaps, asymptotic estimates of the fraction of accessible DFAs that are actually topological e-machines. Along these lines, we conjecture that, for fixed k,

$$\lim_{n \to \infty} E_{n,k}/B_{n,k} = 0,$$

and, for fixed n,

$$\lim_{k \to \infty} E_{n,k}/B_{n,k} = 1/n.$$



Appendix A: String-index mapping 

Let S be some DFA string representation, and let f be the flag corresponding to S. Then we have B_{n,k}(S) = n_f + n_r, where:
Equation (A1) calculates the first index that uses the given flag, and Eq. (A2) calculates the index of the string S among those DFAs with the given flag.

Eq. (A1) refers to the number N_{j,l} of accessible DFAs whose string representation has the first occurrence of symbol j in position l. It can be defined by a recursive formula, and its values can be stored in a table for efficient access. For completeness we provide the formulas here, but for more detail we direct the reader to Ref. [11]:

$$N_{n-1,j} = (n+1)^{nk-1-j}, \qquad j \in [n-2,\, (n-1)k-1],$$

$$N_{m,mk-1} = \sum_{i=0}^{k-1} (m+2)^{i} \, N_{m+1,mk+i}, \qquad m \in [1,\, n-2].$$
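The recursion can be sketched in a few lines, with a brute-force cross-check for small n and k. Besides the two formulas above, the sketch assumes two steps that are our own derivation, not taken from the source: a general step N_{m,j} = (m + 2) N_{m,j+1} + N_{m+1,j+1} for m − 1 ≤ j < mk − 1 (position j + 1 holds either one of the m + 2 values -1, ..., m, or the first occurrence of m + 1), and the total B_{n,k} = Σ_{j=0}^{k−1} 2^j N_{1,j} (positions before the first 1 can only hold -1 or 0). The sketch counts DFAs only; it does not compute the index B_{n,k}(S).

```python
from functools import lru_cache
from itertools import product


def count_accessible(n: int, k: int) -> int:
    """B_{n,k} from the N_{j,l} recursion (general step and total ASSUMED)."""
    if n == 1:
        return 2 ** k                    # each entry is -1 or 0

    @lru_cache(maxsize=None)
    def N(m: int, j: int) -> int:
        # Choices for positions j+1, ..., nk-1, given that state m first
        # occurs at position j.
        if m == n - 1:
            return (n + 1) ** (n * k - 1 - j)
        if j == m * k - 1:
            return sum((m + 2) ** i * N(m + 1, m * k + i) for i in range(k))
        return (m + 2) * N(m, j + 1) + N(m + 1, j + 1)   # assumed step

    return sum(2 ** j * N(1, j) for j in range(k))


def count_brute_force(n: int, k: int) -> int:
    """Count strings that are accessible and already labeled from state 0."""
    total = 0
    for S in product(range(-1, n), repeat=n * k):
        order, new, i = [0], {0: 0}, 0
        while i < len(order):
            for v in range(k):
                t = S[order[i] * k + v]
                if t != -1 and t not in new:
                    new[t] = len(order)
                    order.append(t)
            i += 1
        if len(order) == n and all(new[q] == q for q in range(n)):
            total += 1
    return total


print(count_accessible(1, 2), count_accessible(2, 2))    # 4 45
print(count_accessible(3, 2), count_brute_force(3, 2))   # 816 816
```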